You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.
Do not write your name on the assignment.
Write your code in the Code cells and your answer in the Markdown cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.
Use Quarto to print the .ipynb file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: quarto render filename.ipynb --to html. Submit the HTML file.
The assignment is worth 100 points, and is due on 13th October 2022 at 11:59 pm.
13 1
13.1 1(a)
Air quality sensors are used to measure the amount of contaminants in the air. This question will guide you in finding locations for installing 50 air quality sensors in the State of Colorado, such that they are as far away from each other as possible. The approach below is a greedy algorithm to find an approximate Maximin design.
The file colorado_coordinate_grid.txt contains the coordinate-pairs (latitude and longitude) of potential locations for installing an air quality sensor.
Read the file with NumPy. How many coordinate-pairs are there in the file?
Note that:
A coordinate-pair means a latitude-longitude pair.
‘Air quality sensor’ will be referred to as ‘sensor’ in the questions below for brevity.
(4 points)
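As a minimal sketch of reading numeric text with NumPy, the example below calls `np.loadtxt` on a small in-memory stand-in for a file. The three toy rows, the whitespace delimiter, and the sign of the longitudes are assumptions made for illustration; the actual layout of colorado_coordinate_grid.txt may differ.

```python
import io
import numpy as np

# Hypothetical stand-in for a coordinate file: one latitude-longitude pair
# per row, separated by whitespace (an assumption about the format).
toy_file = io.StringIO("39.0 -105.0\n39.5 -104.5\n40.0 -104.0\n")

coords = np.loadtxt(toy_file)   # 2-D array, one row per coordinate-pair
n_pairs = coords.shape[0]       # number of rows = number of coordinate-pairs

print(coords.shape, n_pairs)
```

With a real file, the file name would be passed to `np.loadtxt` in place of the `StringIO` object, and a `delimiter` argument may be needed if the values are not whitespace-separated.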
13.2 1(b)
The first sensor is to be installed closest to Denver (closest in terms of Euclidean distance). Find the coordinate-pair of the location where the first sensor will be installed. The coordinate-pair of Denver is: [39.7392\(^{\circ}\) N, 104.9903\(^{\circ}\) W]
Note that the suffixes \(^{\circ}\) N and \(^{\circ}\) W are omitted in the file colorado_coordinate_grid.txt.
Hint: Broadcasting
(4 points)
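The broadcasting hint can be illustrated on toy data. In the sketch below, `toy_grid` and `target` are made-up values, not the assignment data; subtracting a length-2 array from an (n, 2) array broadcasts the subtraction across every row.

```python
import numpy as np

# Made-up points and target, for illustration only.
toy_grid = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [5.0, 5.0]])
target = np.array([1.2, 0.9])

# (4, 2) - (2,) broadcasts: target is subtracted from every row.
diffs = toy_grid - target
dists = np.sqrt((diffs ** 2).sum(axis=1))   # Euclidean distance per row

nearest = toy_grid[dists.argmin()]          # row with the smallest distance
print(nearest)
```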
13.3 1(c)
Find the coordinate-pair of the installation of the next sensor, such that it is as far as possible from the first sensor installed near Denver.
Hint: Broadcasting
(4 points)
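Finding the farthest point follows the same pattern with `argmax` in place of `argmin`. This sketch again uses made-up points, and uses `np.linalg.norm` with `axis=1` as one possible way to compute the row-wise Euclidean distances.

```python
import numpy as np

# Made-up points; `first` plays the role of an already chosen location.
toy_grid = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [5.0, 5.0]])
first = np.array([1.0, 1.0])

# Row-wise Euclidean norm of the broadcast differences.
dists = np.linalg.norm(toy_grid - first, axis=1)

farthest = toy_grid[dists.argmax()]   # row with the largest distance
print(farthest)
```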
13.4 1(d)
Stack the coordinate-pairs of the first and second sensors vertically to obtain a 2 x 2 NumPy array. Name the array air_sensor_coordinates.
Run the code below to check if your results seem correct. The coordinate-pairs of the two air quality sensors will be marked as blue dots.
13.5 1(e)
Now you need to find the coordinate-pair for installing the third sensor such that it is far away from the two already-installed sensors. Proceed as follows:
(1) Find the minimum distance of each coordinate-pair in colorado_coordinate_grid.txt from the two already installed sensors. For example, if a coordinate-pair is at a distance of 5 units from the first sensor, and 10 units from the second sensor, then its minimum distance from the sensors will be \(\min(5,10) = 5\) units.
(2) Select the coordinate-pair (from colorado_coordinate_grid.txt) whose minimum distance from the two already installed sensors is the maximum.
(3) Stack the coordinate-pair of the third air quality sensor vertically on the array air_sensor_coordinates.
(4) Call the function sensor_viz() to check if your results seem correct. The coordinate-pairs of the three air quality sensors will be marked as blue dots.
Hint:
For step (1) above:
Define a function which computes the distances of a coordinate-pair from all the coordinates of air_sensor_coordinates, and returns the minimum distance.
Apply the function on all the coordinate-pairs in colorado_coordinate_grid.txt using the NumPy function apply_along_axis().
(25 points)
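The pattern described in the hint can be sketched on toy data as follows. The arrays `toy_grid` and `placed` and the helper name `min_distance` are hypothetical; `np.apply_along_axis` passes each row of `toy_grid` to the function, along with any extra positional arguments.

```python
import numpy as np

# Made-up candidate points and two "already installed" points.
toy_grid = np.array([[0.0, 0.0], [0.0, 10.0], [6.0, 8.0], [10.0, 0.0]])
placed = np.array([[0.0, 0.0], [0.0, 10.0]])

def min_distance(point, installed):
    """Smallest Euclidean distance from `point` to any row of `installed`."""
    return np.sqrt(((installed - point) ** 2).sum(axis=1)).min()

# Apply the function to every row (axis=1) of toy_grid.
min_dists = np.apply_along_axis(min_distance, 1, toy_grid, placed)

# The next point is the one whose minimum distance is largest.
third = toy_grid[min_dists.argmax()]
placed = np.vstack((placed, third))   # stack the new point as a new row
print(placed)
```

In this toy case the point [10, 0] is selected: its minimum distance to the two installed points is 10, larger than that of any other candidate.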
13.6 1(f)
You need to find 47 more coordinate-pairs to install air quality sensors well-spread across Colorado. We will generalize the steps in 1(e)-5 to proceed as follows:
Suppose you have already found the coordinate-pairs for the installation of i sensors.
Find the minimum distance of each coordinate-pair in colorado_coordinate_grid.txt from the \(i\) already installed sensors. For example, if a coordinate-pair is at a distance of \(d_1\) from the first sensor, \(d_2\) from the second sensor, ..., and \(d_i\) from the \(i^{th}\) sensor, then its minimum distance from the sensors will be \(\min(d_1, d_2, \ldots, d_i)\).
Select the \((i+1)^{th}\) coordinate-pair (from colorado_coordinate_grid.txt) as the one whose minimum distance from the \(i\) already installed sensors is the maximum.
Call the function sensor_viz() to check if your results seem correct. You should see 50 blue dots well spread across Colorado.
(10 points)
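The general greedy step can be sketched as a loop on random toy data. Everything here is an assumption made for illustration: the random `toy_grid`, the target count `k = 5`, and the arbitrary starting point. The sketch also computes all pairwise distances with broadcasting rather than with apply_along_axis(), which is an alternative to the approach hinted at earlier, not the required one.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_grid = rng.uniform(0, 10, size=(200, 2))   # made-up candidate points
k = 5                                          # hypothetical number of points to pick

chosen = toy_grid[[0]]   # arbitrary starting point, kept 2-D with shape (1, 2)

while chosen.shape[0] < k:
    # (200, 1, 2) - (1, i, 2) broadcasts to (200, i, 2); summing over the
    # last axis gives a (200, i) matrix of candidate-to-chosen distances.
    diffs = toy_grid[:, None, :] - chosen[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Minimum distance to any chosen point, then the candidate maximizing it.
    next_point = toy_grid[dists.min(axis=1).argmax()]
    chosen = np.vstack((chosen, next_point))

print(chosen)
```

Points that are already chosen have a minimum distance of zero, so they are not selected again as long as other candidates remain.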
14 2
When the monthly sales of a product are subject to seasonal fluctuations, a curve that approximates the sales formula might have the form:
\[y = a + b*x + c*\sin\bigg(2*\pi*\frac{x}{12}\bigg),\]
where \(x\) is the time since the starting point in months and \(y\) is the monthly sales in USD (million). The term \(a + b*x\) gives the basic sales trend and the \(\sin\) term reflects the seasonal changes in sales. Suppose the model parameters (i.e., \(a\), \(b\), and \(c\)) have been estimated and stored in the list below for the sales of a certain brand of sunscreen starting June 1, 2017.
model_parameters = [2, 5, 18]
Then, the total monthly sales in June 2017 will be calculated by substituting \(x = 1\) into the equation.
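As a quick check of the formula for a single month, the sketch below evaluates it at \(x = 1\) with the parameters from the list, assuming they are ordered as \(a\), \(b\), \(c\). Since \(\sin(2\pi/12) = 0.5\), the result is approximately \(2 + 5 + 18 \times 0.5 = 16\).

```python
import math

# Parameters from the list above, assumed to be in the order a, b, c.
a, b, c = 2, 5, 18
x = 1   # first month

y = a + b * x + c * math.sin(2 * math.pi * x / 12)
print(round(y, 6))
```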
Using matrix multiplication with NumPy, we wish to estimate the total sales between June 1, 2017 and March 1, 2020. (Many models failed to predict sales after that, probably due to COVID.)
Proceed as follows.
14.1 2(a)
Create a NumPy array where the first column is all \(1\)s, the second column is a range of numbers from 1 to the total number of months from June 1, 2017 to March 1, 2020, and the third column contains the \(\sin(2*\pi*x/12)\) values computed from the \(x\) values in the second column.
(10 points)
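One way to assemble such a three-column array is `np.column_stack`. The sketch below uses a hypothetical `n_months = 6` purely for illustration; the actual number of months for the date range in the question is left for you to determine.

```python
import numpy as np

n_months = 6   # hypothetical value, not the answer to the question

x = np.arange(1, n_months + 1)          # 1, 2, ..., n_months
X = np.column_stack((
    np.ones(n_months),                  # column of 1s (multiplies a)
    x,                                  # month index (multiplies b)
    np.sin(2 * np.pi * x / 12),         # seasonal term (multiplies c)
))

print(X.shape)
```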
14.2 2(b)
Create an array from the list model_parameters.
(3 points)
14.3 2(c)
Use matrix multiplication to get the monthly sales estimates for each month from June 1, 2017 to March 1, 2020.
(8 points)
14.4 2(d)
Find the total sales between June 1, 2017 and March 1, 2020.
(3 points)
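The matrix-multiplication step can be sketched on a tiny hand-built matrix. The two rows of `X_toy` below correspond to \(x = 1\) and \(x = 3\), with the sine values (0.5 and 1) typed in by hand; the name `X_toy` is hypothetical, and the rows are chosen only so the arithmetic is easy to verify.

```python
import numpy as np

params = np.array([2, 5, 18])   # a, b, c from the list in the question

# Hand-built rows [1, x, sin(2*pi*x/12)] for x = 1 and x = 3.
X_toy = np.array([
    [1.0, 1.0, 0.5],
    [1.0, 3.0, 1.0],
])

monthly = X_toy @ params   # each row gives a + b*x + c*sin(...)
total = monthly.sum()      # sum over the months in the matrix

print(monthly, total)
```

Row by row, the products are \(2 + 5 + 9 = 16\) and \(2 + 15 + 18 = 35\), so the total over these two months is 51.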
15 3
This problem demonstrates the benefit of generating a matrix of pseudorandom numbers with NumPy.
The list exercise_minutes below consists of the exercise minutes per week of students in the STAT303-1 Fall 2022 class.
We wish to find the 95% confidence interval of the mean of exercise_minutes, using bootstrapping.
Bootstrapping is a non-parametric method for obtaining a confidence interval. The method is as follows.
(a) Suppose the list exercise_minutes has \(N\) values.
(b) Randomly sample \(N\) values with replacement from exercise_minutes.
(c) Find the mean of the \(N\) values obtained in (b).
(d) Repeat steps (b) and (c) 10,000 times.
(e) The 95% confidence interval is the range between the 2.5% and 97.5% percentile values of the 10,000 means obtained in (c).
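The procedure above can be sketched without an explicit loop by drawing all resamples at once as a matrix. The values in `toy_minutes` are made up for illustration and are not the class data; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data standing in for exercise_minutes.
toy_minutes = np.array([30, 60, 90, 120, 150, 45, 200, 0, 75, 180])
N = toy_minutes.size
n_boot = 10_000

# Each row is one bootstrap resample of size N, drawn with replacement.
samples = rng.choice(toy_minutes, size=(n_boot, N), replace=True)

boot_means = samples.mean(axis=1)   # one mean per resample
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(lower, upper)
```

Generating the whole (10,000, N) matrix in one call replaces 10,000 separate sampling steps with a single vectorized operation, which is the benefit the problem statement refers to.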